
    How Consistent is Relevance Feedback in Exploratory Search?

    Search activities involving knowledge acquisition, investigation and synthesis are collectively known as exploratory search. Exploratory search is challenging for users, who may be unable to formulate search queries, have ill-defined search goals, or even struggle to understand search results. To ameliorate these difficulties, reinforcement learning-based information retrieval systems were developed to provide adaptive support to users. Reinforcement learning is used to build a model of user intent based on relevance feedback provided by the user. But how reliable is relevance feedback in this context? To answer this question, we developed a novel permutation-based metric for scoring the consistency of relevance feedback. We used this metric to perform a retrospective analysis of interaction data from lookup and exploratory search experiments. Our analysis shows that for lookup search, relevance judgments are highly consistent, supporting previous findings that relevance feedback improves retrieval performance. For exploratory search, however, the distribution of consistency scores reveals considerable inconsistency.
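    The abstract does not spell out the metric itself, so the following is a minimal sketch of one plausible reading, assuming the score compares within-item agreement of repeated relevance judgments against a permutation null; the agreement statistic and the shuffling scheme are illustrative choices, not the authors' definition.

```python
# A hedged sketch of a permutation-based consistency score for repeated
# relevance judgments. The scoring function (mean within-item agreement)
# is an assumption; the article's exact metric is not given in the abstract.
import numpy as np

def within_item_agreement(items, labels):
    """Mean pairwise agreement of labels given to the same item."""
    agree, pairs = 0, 0
    for item in set(items):
        ls = [l for i, l in zip(items, labels) if i == item]
        for a in range(len(ls)):
            for b in range(a + 1, len(ls)):
                agree += ls[a] == ls[b]
                pairs += 1
    return agree / pairs if pairs else float("nan")

def consistency_score(items, labels, n_perm=1000, seed=0):
    """Observed agreement plus the fraction of label-shuffled null samples
    it exceeds (high fraction = more consistent than chance)."""
    rng = np.random.default_rng(seed)
    observed = within_item_agreement(items, labels)
    null = [within_item_agreement(items, rng.permutation(labels))
            for _ in range(n_perm)]
    return observed, float(np.mean([observed > n for n in null]))

# Example: d1 and d2 are judged consistently, d3 inconsistently.
items = ["d1", "d1", "d2", "d2", "d3", "d3"]
labels = [1, 1, 0, 0, 1, 0]
print(consistency_score(items, labels))
```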

    TOPAZ: asymmetric suffix array neighbourhood search for massive protein databases

    Protein homology search is an important, yet time-consuming, step in everything from protein annotation to metagenomics. Its application, however, has become increasingly challenging due to the exponential growth of protein databases. To perform homology search at the required scale, many methods have been proposed as alternatives to BLAST that make an explicit trade-off between sensitivity and speed. One such method, SANSparallel, uses a parallel implementation of the suffix array neighbourhood search (SANS) technique to achieve high speed, and provides several modes that allow greater sensitivity at the expense of performance.
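    To make the SANS idea concrete, here is a minimal, deliberately naive sketch: suffixes of the query are located in a suffix array built over the concatenated database, and the sequences owning nearby suffixes are collected as candidate homologs. The window size, voting scheme and quadratic suffix array construction are illustrative assumptions; they do not reflect TOPAZ's asymmetric search or SANSparallel's implementation.

```python
# A naive sketch of suffix array neighbourhood search over a protein database.
import bisect

def build_suffix_array(text):
    """Naive O(n^2 log n) suffix array; fine for a sketch, not for real use."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def sans_candidates(query, db_seqs, window=2):
    # Concatenate database sequences with a separator, remembering which
    # sequence owns each text position.
    text, owner = "", []
    for idx, seq in enumerate(db_seqs):
        text += seq + "$"
        owner.extend([idx] * (len(seq) + 1))
    sa = build_suffix_array(text)
    suffixes = [text[i:] for i in sa]  # materialized only for clarity

    votes = {}
    for q in range(len(query)):
        # Locate where this query suffix would sit in suffix array order.
        pos = bisect.bisect_left(suffixes, query[q:])
        # Vote for owners of the suffixes within `window` of that point.
        for j in range(max(0, pos - window), min(len(sa), pos + window)):
            votes[owner[sa[j]]] = votes.get(owner[sa[j]], 0) + 1
    return sorted(votes.items(), key=lambda kv: -kv[1])

db = ["MKVLAAGITK", "MKVLSAGITR", "GGGPPPWWW"]
print(sans_candidates("MKVLAAGITR", db))  # near-identical sequences rank first
```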

    Lexical ambiguity detection in professional discourse

    Professional discourse is the language used by specialists, such as lawyers, doctors and academics, to communicate the knowledge and assumptions associated with their respective fields. Professional discourse can be especially difficult for non-specialists to understand due to the lexical ambiguity of commonplace words that have a different or more specific meaning within a specialist domain. This phenomenon also makes it harder for specialists to communicate with the general public, because specialists are similarly unaware of the potential for misunderstandings. In this article, we present an approach for detecting domain terms that are lexically ambiguous with respect to everyday English. We demonstrate the efficacy of our approach with three case studies in statistics, law and biomedicine. In all case studies, we identify domain terms with a precision@100 greater than 0.9, outperforming the best-performing baseline by 18.1–91.7%. Most importantly, we show this ranking is broadly consistent with semantic differences. Our results highlight the difficulties that existing semantic difference methods have in the cross-domain setting, where they rank non-domain terms highly due to noise or biases in the data. We additionally show that our approach generalizes to short phrases and investigate its data efficiency by varying the number of labeled examples.
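    The evaluation protocol above is concrete enough to sketch. The snippet below shows precision@k scoring for a ranked term list; the frequency-ratio scorer is only a simple stand-in baseline, assumed for illustration, and not the detection method proposed in the article.

```python
# Precision@k over a ranked term list, with a toy frequency-ratio baseline.
from collections import Counter

def precision_at_k(ranked_terms, relevant, k=100):
    """Fraction of the top-k ranked terms that are labeled relevant."""
    top = ranked_terms[:k]
    return sum(t in relevant for t in top) / len(top)

def frequency_ratio_rank(domain_tokens, general_tokens, smoothing=1.0):
    """Rank terms by how much more frequent they are in domain text
    than in general text (a simple keyness baseline, not the paper's method)."""
    d, g = Counter(domain_tokens), Counter(general_tokens)
    score = lambda t: (d[t] + smoothing) / (g[t] + smoothing)
    return sorted(d, key=score, reverse=True)

domain = "prior posterior likelihood prior significant".split()
general = "significant the a prior of".split()
ranking = frequency_ratio_rank(domain, general)
print(ranking, precision_at_k(ranking, {"posterior", "likelihood"}, k=2))
```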

    How Relevance Feedback is Framed Affects User Experience, but not Behaviour

    Retrieval systems based on machine learning require both positive and negative examples to perform inference, and these are usually obtained through relevance feedback. Unfortunately, explicit negative relevance feedback is thought to lead to poor user experience, so systems typically rely on implicit negative feedback instead. In this study, we confirm that, in the case of binary relevance feedback, users prefer giving positive feedback (and implicit negative feedback) over negative feedback (and implicit positive feedback). These two feedback mechanisms are functionally equivalent, capturing the same information from the user, but differ in how they are framed. Despite users' preference for positive feedback, there were no significant differences in behaviour. As users were not shown how feedback influenced search results, we hypothesise that previously reported results could, at least in part, be due to cognitive biases related to user perception of negative feedback.
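    The functional equivalence claimed above is easy to verify for binary feedback over a fixed result set: marking positives (with the remainder implicitly negative) induces exactly the same labelling as marking negatives. A small sketch, with hypothetical document identifiers:

```python
# Both framings of binary relevance feedback label every shown document.
shown = {"d1", "d2", "d3", "d4"}

def labels_from_positive(marked):
    """Positive framing: marked items relevant, the rest implicitly not."""
    return {d: (d in marked) for d in shown}

def labels_from_negative(marked):
    """Negative framing: marked items not relevant, the rest implicitly are."""
    return {d: (d not in marked) for d in shown}

# Marking {d1, d3} positive equals marking {d2, d4} negative.
assert labels_from_positive({"d1", "d3"}) == labels_from_negative({"d2", "d4"})
```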

    Can Language Models Identify Wikipedia Articles with Readability and Style Issues?

    Wikipedia is frequently criticised for poor readability and style issues. In this article, we investigate using GPT-2, a neural language model, to identify poorly written text in Wikipedia by ranking documents by their perplexity. We evaluated the properties of this ranking using human assessments of text quality, including readability, narrativity and language use. We demonstrate that GPT-2 perplexity scores correlate moderately to strongly with narrativity, but only weakly with reading comprehension scores. Importantly, the model reflects even small improvements to text, as would be seen in Wikipedia edits. We conclude by noting that Wikipedia's featured articles counter-intuitively contain text with the highest perplexity scores; these examples, however, illustrate many of the complexities that need to be resolved for such an approach to be used in practice.
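    Perplexity ranking of this kind is straightforward to prototype. Below is a minimal sketch using the Hugging Face transformers implementation of GPT-2; truncating to the model's context window and scoring each document in one pass are simplifications assumed here, not the article's exact protocol.

```python
# Rank texts by GPT-2 perplexity (exp of mean token cross-entropy).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

@torch.no_grad()
def perplexity(text):
    ids = tokenizer(text, return_tensors="pt", truncation=True,
                    max_length=model.config.n_positions).input_ids
    loss = model(ids, labels=ids).loss  # mean token cross-entropy
    return torch.exp(loss).item()

articles = {"well-written": "The cat sat on the mat.",
            "garbled": "Mat the cat on sat the."}
ranked = sorted(articles, key=lambda k: perplexity(articles[k]), reverse=True)
print(ranked)  # highest perplexity first, i.e. flagged for review
```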

    Statistically Significant Detection of Semantic Shifts using Contextual Word Embeddings

    Detecting lexical semantic change in smaller data sets, e.g. in historical linguistics and the digital humanities, is challenging due to a lack of statistical power. This issue is exacerbated by non-contextual embedding models that produce one embedding per word and, therefore, mask the variability present in the data. In this article, we propose an approach to estimating semantic shift that combines contextual word embeddings with permutation-based statistical tests. We use the false discovery rate procedure to address the large number of hypothesis tests being conducted simultaneously. We demonstrate the performance of this approach in simulation, where it achieves consistently high precision by suppressing false positives. We additionally analyze real-world data from SemEval-2020 Task 1 and the Liverpool FC subreddit corpus. We show that by taking sample variation into account, we can improve the robustness of individual semantic shift estimates without degrading overall performance.
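    A minimal sketch of this kind of pipeline follows: a per-word permutation test on two sets of contextual embeddings, with Benjamini-Hochberg FDR control across words. The test statistic (distance between mean embeddings) and the random stand-in embeddings are illustrative assumptions; the article's exact test may differ.

```python
# Permutation tests for semantic shift plus Benjamini-Hochberg FDR control.
import numpy as np

def shift_pvalue(emb_a, emb_b, n_perm=2000, seed=0):
    """P-value for the distance between mean embeddings under label shuffling."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([emb_a, emb_b])
    n_a = len(emb_a)
    observed = np.linalg.norm(emb_a.mean(0) - emb_b.mean(0))
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(pooled))
        stat = np.linalg.norm(pooled[perm[:n_a]].mean(0)
                              - pooled[perm[n_a:]].mean(0))
        hits += stat >= observed
    return (hits + 1) / (n_perm + 1)

def benjamini_hochberg(pvals, alpha=0.05):
    """Boolean mask of discoveries under BH FDR control."""
    p = np.asarray(pvals)
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, len(p) + 1) / len(p)
    passed = p[order] <= thresholds
    k = passed.nonzero()[0].max() + 1 if passed.any() else 0
    mask = np.zeros(len(p), bool)
    mask[order[:k]] = True
    return mask

# Random stand-ins for contextual embeddings from two time periods.
rng = np.random.default_rng(1)
words = {"stable": (rng.normal(0, 1, (30, 8)), rng.normal(0, 1, (30, 8))),
         "shifted": (rng.normal(0, 1, (30, 8)), rng.normal(2, 1, (30, 8)))}
pvals = [shift_pvalue(a, b) for a, b in words.values()]
print(dict(zip(words, benjamini_hochberg(pvals))))
```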

    Critiquing-based Modeling of Subjective Preferences

    Applications designed for entertainment and other non-instrumental purposes are challenging to optimize because the relationships between system parameters and user experience can be unclear. Ideally, we would crowdsource these design questions, but existing approaches are geared towards evaluating or ranking discrete choices, not optimizing over continuous parameter spaces. In addition, users are accustomed to informally expressing opinions about experiences as critiques (e.g. it's too cold, too spicy, too big), rather than giving the precise feedback an optimization algorithm would require. Unfortunately, it can be difficult to analyze qualitative feedback, especially in the context of quantitative modeling. In this article, we present collective criticism, a critiquing-based approach for modeling relationships between system parameters and subjective preferences. We transform critiques, such as "it was too easy/too challenging", into censored intervals and analyze them using interval regression. Collective criticism has several advantages over other approaches: "too much/too little"-style feedback is intuitive for users and allows us to build predictive models for the optimal parameterization of the variables being critiqued. We present two studies applying this approach; they demonstrate its flexibility and show that it produces robust results that are straightforward to interpret and in line with users' stated preferences.
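    The censoring idea can be sketched directly: a "too little" critique at a tried parameter value places the user's ideal above that value (right-censored), and "too much" places it below (left-censored); a normal model is then fit by maximum likelihood over the resulting intervals. The critique-to-interval mapping and the covariate-free model below are illustrative assumptions, not the article's full interval regression.

```python
# Fit a normal model to interval-censored critique data by maximum likelihood.
import numpy as np
from scipy import optimize, stats

def critique_to_interval(value, critique):
    if critique == "too little":   # ideal lies above the tried value
        return (value, np.inf)
    if critique == "too much":     # ideal lies below the tried value
        return (-np.inf, value)
    raise ValueError(critique)

def fit_interval_regression(intervals):
    """MLE of N(mu, sigma) given interval-censored observations."""
    def neg_log_lik(params):
        mu, log_sigma = params
        sigma = np.exp(log_sigma)
        ll = 0.0
        for lo, hi in intervals:
            # Probability mass the model assigns to each observed interval.
            p = stats.norm.cdf(hi, mu, sigma) - stats.norm.cdf(lo, mu, sigma)
            ll += np.log(max(p, 1e-12))
        return -ll
    res = optimize.minimize(neg_log_lik, x0=[0.0, 0.0], method="Nelder-Mead")
    mu, log_sigma = res.x
    return mu, np.exp(log_sigma)

# Users tried difficulty settings and critiqued them.
data = [(3.0, "too little"), (4.0, "too little"), (7.0, "too much"),
        (6.5, "too much"), (5.5, "too much")]
intervals = [critique_to_interval(v, c) for v, c in data]
print(fit_interval_regression(intervals))  # estimated ideal difficulty
```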

    Holes in the Outline: Subject-dependent Abstract Quality and its Implications for Scientific Literature Search

    Scientific literature search engines typically index abstracts instead of the full text of publications. The expectation is that the abstract provides a comprehensive summary of the article, enumerating key points so the reader can assess whether their information needs would be satisfied by reading the full text. Furthermore, from a practical standpoint, obtaining the full text is more complicated due to licensing issues, in the case of commercial publishers, and the resource limitations of public repositories and pre-print servers. In this article, we use topic modelling to represent content in abstracts and full-text articles. Using Computer Science as a case study, we demonstrate that how well the abstract summarises the full text is subfield-dependent. Indeed, we show that abstract representativeness has a direct impact on retrieval performance, with poorer abstracts leading to degraded performance. Finally, we present evidence that how well an abstract represents the full text of an article is not random, but is a consequence of style and writing conventions in different subdisciplines, and can be used to infer an "evolutionary" tree of subfields within Computer Science.
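    One simple way to operationalize abstract representativeness along these lines is sketched below: fit a topic model, map each abstract and its full text to topic distributions, and score their divergence. The use of LDA and the Jensen-Shannon distance are assumptions for illustration; the abstract above does not specify the article's exact topic model or comparison measure.

```python
# Score how well each abstract's topic distribution matches its full text.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from scipy.spatial.distance import jensenshannon

full_texts = ["search engines index retrieval ranking queries documents users",
              "proteins sequences homology alignment databases amino acids"]
abstracts = ["we study retrieval ranking for search queries",
             "we align protein sequences against large databases"]

vec = CountVectorizer()
X = vec.fit_transform(full_texts)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

def topic_dist(text):
    """Infer the document-topic distribution for a single text."""
    return lda.transform(vec.transform([text]))[0]

# Lower distance = abstract more representative of its full text.
for a, f in zip(abstracts, full_texts):
    print(round(jensenshannon(topic_dist(a), topic_dist(f)), 3))
```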